(57) Abstract

The invention relates to a data processing apparatus for executing instructions of a program, the apparatus having a first instruction decoder, an address decoder, a plurality of computational units, and an execution logic unit. The data processing apparatus is characterized in that said first instruction decoder discriminates whether said apparatus is to execute a referential instruction which initiates execution of an instruction of a different type. The invention further relates to a method of executing instructions for a data processing apparatus, which method is characterized in that, upon decoding a referential instruction, the steps of fetching an instruction of a different type according to information included in said referential instruction and decoding said instruction of said different type for determining the operations to be executed in parallel are carried out.
Description

An apparatus for and a method of executing instructions of a program

The invention relates to a data processing apparatus for executing instructions of a program according to the pre-characterizing portion of Claim 1. The invention further relates to a method of executing instructions for a data processing apparatus according to the pre-characterizing portion of Claim 8.

Currently, there exist two main architectures for DSP processors. Both architectures trade off processing speed against program memory size, favouring one or the other. The first main architecture may also be called a regular machine, which means that one single instruction is executed per machine cycle. The second architecture is generally called a VLIW (very long instruction word) architecture. With the VLIW architecture, several instructions are executed within one single machine cycle.

A regular machine executing a single instruction per machine cycle features a relatively small program data bus. Typically, such a program data bus is 32 bits wide. In a DSP processor environment, the number of computational units in the execution unit of the processor is typically smaller than in the second architecture mentioned above. The power consumption of the processor is directly proportional to the program data bus width and to the number of computational units. Thus, a regular processor architecture typically consumes less power than other advanced architectures. However, the major disadvantage of the regular architecture is that its MIPS rate (million instructions executed per second) is lower than that of the VLIW architecture mentioned above.
A regular machine is described, for example, in US patent 5,163,139 entitled "Instruction Preprocessor for Conditionally Combining Short Memory Instructions into Virtual Long Instructions". This regular machine comprises two computational units and a main program memory of regular program data width. The machine proposed in this patent further comprises an instruction preprocessor unit which checks whether or not two subsequent instructions in the program memory can be validly combined so as to form a new instruction word of its own. This new instruction word is then interpreted and executed by the two computational units of the machine. The machine of US patent 5,163,139 is limited in that it can only combine pairs of instructions which meet predefined criteria. Thus, the machine largely constrains a programmer in developing program code.

The second architecture (VLIW) mentioned above is based on an instruction set philosophy in which the compiler packs a number of simple, non-interdependent operations into the same instruction word. This type of architecture was originally proposed by J. A. Fisher in "Very Long Instruction Word Architectures and the ELI-512", Proceedings of the 10th Annual Symposium on Computer Architecture, June 1983. The VLIW architecture assumes multiple computational units in the processor and several decoding units which analyse the instructions fetched from the program memory. A VLIW architecture has the advantage that several operations are executed in parallel, thus increasing the MIPS performance of the processor. However, a VLIW processor requires a program memory of a larger bit width. This is a burden both for the chip area required to implement the processor architecture and for its power consumption. Also, writing code for a VLIW processor inherently demands greater programming skill, since the programmer must take into account the parallelism of the processor.
A particular VLIW processor has been proposed in US patent 5,450,556 entitled "VLIW Processor Which Uses Path Information Generated by a Branch Control Unit to Inhibit Operations Which Are Not on a Correct Path". This patent addresses the problem of efficiently dealing with jump instructions in a VLIW program. To solve this problem, it proposes adding a path expression field to the VLIW instruction. This path expression field is read by a branch control unit in the processor, which operates so as to speed up conditional branch operations. As with all previous VLIW processor architectures, the structure proposed in US patent 5,450,556 suffers from the relatively large program memory required to store VLIW instructions, particularly for execution steps which allow only a small degree of parallelism.

The invention is based on the problem that highly parallel computer architectures demand a large program memory space. The invention thus seeks to lower the program memory demand while maintaining the processor's capability of executing instructions in a highly parallel manner.

The problem is solved by a data processing apparatus having the features of claim 1. The problem is also solved by a method of executing instructions for a data processing apparatus having the features of claim 8. Advantageous embodiments of the inventive apparatus and the inventive method are described in the respective dependent claims.

A preferred data processing apparatus for executing instructions of a program comprises a first instruction decoder, an address decoder, a plurality of computational units and an execution logic unit. The first instruction decoder sequentially fetches program instructions of a first type from a first program memory and decodes instructions of said first type. The address decoder determines the address of data to be loaded from or written back to a data memory. Each of said plurality of computational units executes operations upon data according to the interpretation of said first instruction decoder and provides the results of these operations. The execution logic unit provides said plurality of computational units with data and controls the operation of said plurality of computational units according to an instruction of said first type. The data processing apparatus is characterized in that said first instruction decoder discriminates whether said apparatus is to execute a referential instruction. The referential instruction then initiates the execution of an instruction of a second type.

Thus, the data processing apparatus of the invention is capable of executing two types of program instructions. Preferably, the two types of instructions differ significantly in bit width, instructions of said first type having the shorter bit width. Depending on the actual instruction to be executed, the processing apparatus executes either an instruction word of a relatively short bit width or an instruction of a relatively large bit width. This allows for a flexible program memory organisation and thus a reduction of the total memory demand of a particular program.

A preferred embodiment of the inventive apparatus further comprises a second instruction decoder which fetches an instruction of said second type. In an even more preferred embodiment, instructions of said second type are stored in a second program memory. Thus, said second instruction decoder fetches instructions of said second type from said second program memory and subsequently decodes instructions of said second type.

By providing a separate memory unit for instructions of said first type and for instructions of said second type, it is possible to store frequently used instructions of said second type so that said data processing apparatus can access them easily. Preferably, the bit width of each of said first and second program memories is set at a fixed length. Thus, the architecture of a preferred data processing apparatus can be configured to handle instructions of said first type and said second type efficiently.

A preferred embodiment of the inventive apparatus is further characterized in that an instruction of said second type comprises a plurality of operators including data assignment information of operands and data assignment information of results. It is further preferred that said execution logic comprises means for interpreting said instruction of said second type.

In a particularly preferred embodiment of the invention, said referential instruction includes address information. The address information relates to the data upon which the instruction of said second type referred to in said referential instruction is to be executed. This preferred configuration of the inventive apparatus allows data to be fetched while the instruction of said second type is being decoded, which can significantly increase the performance of the inventive apparatus.

In a preferred embodiment, the apparatus of the invention is configured to allow pipelined execution of instructions of either the first or the second type. This configuration particularly eases the simultaneous execution of operations.

A preferred method of executing instructions for a data processing apparatus comprises the steps of fetching an instruction of a first type from a first program memory, decoding said instruction of said first type for determining the operation to be executed, reading operands from a data memory or from data registers according to operand address information included in said instruction of said first type, executing an operation upon said operands, and writing the results of said operation into said data memory or into said data registers according to result address information included in said instruction of said first type. The inventive method is characterized in that, upon decoding of a referential instruction, which includes predetermined information so as to be decoded as such, the steps of fetching an instruction of a second type according to information included in said referential instruction and decoding said instruction of said second type for determining the operations to be executed in parallel are carried out.
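
The method steps above can be sketched as a small sequential interpreter in Python. This is a minimal, non-authoritative model: the op-code values (`REF_OPCODE`, `REGULAR_OPS`), the class names and the use of a dictionary as data memory are all hypothetical, and pipelining is ignored. The parallel operations of a second-type instruction are modelled by letting all of them read the same operand snapshot before any result is written.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical op-code values; the patent does not specify an encoding.
REF_OPCODE = 0x3F
REGULAR_OPS = {0x01: "add", 0x02: "mul"}
ALU = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


@dataclass
class ShortInstruction:            # first type: regular or referential
    opcode: int
    operand_addrs: list[int]       # operand address information
    result_addrs: list[int]        # result address information
    pointer: int | None = None     # referential only: index into CLIW memory


@dataclass
class CliwInstruction:             # second type: operations and assignments, no addresses
    ops: list[tuple[str, tuple[int, int], int]]  # (operation, operand slots, result slot)


def execute(program, cliw_memory, data_memory):
    for instr in program:                                        # fetch first-type instruction
        operands = [data_memory[a] for a in instr.operand_addrs]  # read operands
        if instr.opcode == REF_OPCODE:                           # decode: referential?
            cliw = cliw_memory[instr.pointer]                    # fetch and decode second type
            # All operations use the same operand snapshot, i.e. they run "in parallel".
            results = {slot: ALU[op](operands[i], operands[j])
                       for op, (i, j), slot in cliw.ops}
        else:
            op = REGULAR_OPS[instr.opcode]                       # one operation per regular instruction
            results = {0: ALU[op](operands[0], operands[1])}
        for slot, value in results.items():                      # write results
            data_memory[instr.result_addrs[slot]] = value
    return data_memory


mem = {0: 2, 1: 3, 2: 4, 3: 5, 10: 0, 11: 0, 12: 0}
cliw = [CliwInstruction(ops=[("mul", (0, 1), 0), ("add", (2, 3), 1)])]
prog = [
    ShortInstruction(0x01, [0, 1], [12]),                            # regular: 2 + 3
    ShortInstruction(REF_OPCODE, [0, 1, 2, 3], [10, 11], pointer=0),  # 2 * 3 and 4 + 5
]
print(execute(prog, cliw, mem))
```

The referential instruction carries the addresses; the CLIW entry carries only the operations and the slot assignments.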

As already described with regard to the preferred data processing apparatus of the invention mentioned above, the preferred method allows for a flexible usage of memory space because two types of instructions are provided. The additional information needed for carrying out a particular parallel operation is obtained by referring, in an instruction of said first type, to further instruction information (an instruction of said second type).
A further preferred embodiment of the inventive method is characterized in that said referential instruction includes address information which is decoded substantially at the time of decoding said referential instruction. This feature allows for a significant increase in processing speed because the data needed for the instruction referred to in the referential instruction is loaded at the time of decoding the referential instruction.

In an even more preferred embodiment of the method of the invention, the step of decoding a referential instruction and the step of fetching an instruction of said second type are executed substantially simultaneously, said referential instruction and said instruction of said second type being associated with each other. This allows for an even further increase in processing speed because the information required for carrying out an instruction of said second type is already provided at the time of decoding said referential instruction.

In a still further preferred embodiment of the method of the invention, said step of reading operands from a data memory and said step of decoding an instruction of said second type are executed substantially simultaneously, the operands read being associated with the instruction decoded. This preferred feature allows for a still greater increase in processing speed, as all information is now available to the computational units of the data processing apparatus for carrying out the operations according to the instruction of said second type.

Further advantages, features and possibilities of using the invention are explained in the following description of a preferred embodiment of the invention, which is to be read in conjunction with the attached drawings. In the drawings:

Figure 1 depicts a circuit diagram of a preferred data processing apparatus according to the invention;

Figure 2a shows an example of the structure of a very long instruction word as used in the prior art;

Figure 2b shows the structures of instructions of two different types as used in the preferred embodiment of the invention;

Figure 3a is a table showing the sequence of pipelined instructions in a data processing apparatus of the prior art; and

Figure 3b is a table showing the sequence of instructions according to a preferred embodiment of the invention.
Figure 1 shows the basic architecture of the preferred embodiment of a data processing apparatus according to the invention. The data processing apparatus, which is particularly suited to digital signal processing, is configured for the parallel execution of several operations and thus comprises a plurality of computational units. In the preferred embodiment, four computational units are provided, which are assigned reference numerals 61 to 64. Each of the computational units 61 to 64 is provided with operand data from an execution logic unit 7. Each of the computational units, in turn, delivers the result of a computation to one or more registers of a bank 5 of multiport registers and/or to a data memory 3 through a data bus connecting said computational units 61 to 64 to said data memory 3, said data bus having a bit width of r bits. In the preferred embodiment, two results may be written directly into said data memory, which has a data bit width of 16 bits. Thus, the bit width r equals 2 x 16 bits = 32 bits.

The contents of each of said multiport registers 5 are fed back through a bus line of bit width n to said execution logic unit 7. The contents of said multiport registers 5 are also provided to an address decoder 4 for selectively writing data from said multiport registers 5 into said data memory 3. The multiport registers 5 are therefore connected to said address decoder 4 through a bus line also having a bit width of n bits. In the preferred embodiment, each register has a data bit width of 16 bits. Further, the bank 5 of multiport registers comprises a total of 16 registers. Thus, n is set to 16 x 16 bits = 256 bits in the preferred embodiment of the data processing apparatus.

With this configuration of the preferred embodiment, the data processing apparatus of the invention can be operated either as a register-memory architecture machine or as a memory-memory architecture machine. On the one hand, the execution logic unit 7 receives data not only from said multiport registers 5 but also directly from said data memory 3. On the other hand, the computational units 61 to 64 write not only to said multiport registers 5 but also directly to said data memory 3. It is clear to a person skilled in the art that the invention can similarly be embodied in a load-store architecture machine (also called a register-register architecture machine) without departing from the scope of this invention.

As already mentioned above, the execution logic unit 7 receives operand data not only from said multiport registers 5 but also from said data memory 3 through a bus line having a bit width of o bits. The bit width o of the data bus between said data memory 3 and said execution logic unit 7 is proportional to the number of operands to be loaded from said data memory 3 and to the bit width of each operand. In the preferred embodiment, a maximum of four operands, each having a bit width of 16 bits, are loaded from said data memory 3 to said execution logic unit 7, resulting in a bus width o of 4 x 16 bits = 64 bits.

The execution logic unit 7 receives decoded instruction information from a regular instruction decoder 1. The execution logic unit 7 thus receives the operands for carrying out a particular instruction from said multiport registers 5 and/or said data memory 3 and delivers them to said computational units 61 to 64 as indicated by the decoded regular instruction. The execution logic unit 7 further comprises means 8 for receiving a decoded instruction from a CLIW (Configurable Length Instruction Word) instruction decoder 9. Once a decoded CLIW instruction is received, said receiving means 8 in said execution logic unit 7 ensures that execution is carried out not according to information received from said regular instruction decoder 1 but exclusively according to the decoded instruction received from said CLIW instruction decoder 9. Thus, said receiving means 8 replaces all the information from said regular instruction decoder 1 with the information received from said CLIW instruction decoder 9.

Said regular instruction decoder 1 receives a line of code from a regular program memory 2 for decoding the instruction encoded therein. For sequential operation of the data processing apparatus, the regular program memory 2 is addressed by the output of a program counter 15. The regular instruction decoder 1 delivers decoded instruction information to said execution logic unit 7 and delivers an address encoded in a particular instruction to said address decoder 4. The regular instruction decoder 1 is further connected to said CLIW instruction decoder 9 for indicating the fact that a CLIW instruction is to be decoded next.

The address decoder 4 receives address information from said regular instruction decoder 1 for decoding the address encoded in a particular instruction. The decoded address is delivered through a bus line having a bit width of m bits to said data memory 3. The bit width m is proportional to the number of addresses to be applied at a time and to the number of bits per address. In the preferred embodiment, the address decoder 4 decodes four addresses, each having a bit width of 16 bits, thus resulting in a bit width m of 64 bits for the bus line connecting said address decoder 4 and said data memory 3. Said data memory 3 is further connected to said regular instruction decoder 1 through lines R/W indicating to said data memory 3 whether data at the specified addresses is to be read or written.
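
The four bus widths r, n, o and m given for the preferred embodiment all follow the same rule: the number of 16-bit words moved at once multiplied by the word width. The short Python check below merely restates that arithmetic; the function and parameter names are illustrative only.

```python
WORD_BITS = 16  # data word, register and address width in the preferred embodiment


def bus_widths(results_to_memory=2, registers=16, operands_from_memory=4,
               addresses=4, word=WORD_BITS):
    """Bus widths in bits: number of words transferred at once times word width."""
    return {
        "r": results_to_memory * word,     # computational units 61-64 -> data memory 3
        "n": registers * word,             # multiport registers 5 -> execution logic 7 / address decoder 4
        "o": operands_from_memory * word,  # data memory 3 -> execution logic unit 7
        "m": addresses * word,             # address decoder 4 -> data memory 3
    }


print(bus_widths())  # {'r': 32, 'n': 256, 'o': 64, 'm': 64}
```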

Said CLIW instruction decoder 9 is connected to a CLIW memory 10 having stored therein lines of code representing CLIW instructions. The particular instruction to be read from said CLIW memory 10 is indicated by said regular instruction decoder 1 through a line P connecting said regular instruction decoder 1 to said CLIW memory 10. Thus, the regular instruction decoder 1 points to a particular storage location of said CLIW memory 10, and the CLIW instruction stored therein is delivered to said CLIW instruction decoder 9.

The general operation of the preferred embodiment of the invention can be described as follows. The execution logic unit 7 operates according to instructions sequentially read from said regular program memory 2. As long as said regular instruction decoder 1 does not decode a special instruction, the operation of said CLIW instruction decoder 9 and said CLIW memory 10 is practically inhibited. However, once said regular instruction decoder 1 decodes a special instruction (which can also be called a referential instruction), the function of said CLIW instruction decoder 9 and said CLIW memory 10 is activated. In effect, the execution logic unit 7 then operates exclusively according to information received from said CLIW instruction decoder 9 instead of information received from said regular instruction decoder 1.

In the preferred embodiment of the invention, the mentioned special instruction from said regular program memory 2 contains address information which the regular instruction decoder 1 delivers to said address decoder 4. In order for the data processing machine to execute such a special instruction, instruction information from said special instruction and instruction information from an associated CLIW instruction are combined.



Figure 2a shows the typical structure of a very long instruction word according to the prior art. The instruction word 14 of Figure 2a basically consists of four segments. In a first segment, a plurality of operations are defined. In a second segment, operands are assigned to each of these operations. In a third segment, results are assigned to each of these operations. Finally, in a fourth segment, memory addresses are defined for the operands and results assigned in said second and said third segments, respectively.
Figure 2b shows the structure of instruction words which are used in conjunction with the invention. There is shown a regular (short) instruction 11 having a length of k bits. A regular instruction 11 includes an instruction header containing an operation code (op code) which defines the type of instruction. Figure 2b further shows the structure of a referential instruction 12, which also has a length of k bits. A special op code is stored in the op code header of the referential instruction 12, which op code distinguishes the referential instruction 12 from other regular instructions 11. The referential instruction 12 also includes a plurality of memory addresses upon which a particular referential instruction is to be executed. Finally, the referential instruction includes a pointer P which points to a CLIW instruction.

Figure 2b also shows the structure of a CLIW instruction 13. The structure is basically identical to that of a VLIW instruction 14 according to Figure 2a, except that a CLIW instruction 13 does not include any memory addresses. In fact, the addresses for a particular CLIW instruction are included in a referential instruction 12 which points through its pointer P to that CLIW instruction 13. A CLIW instruction is shown to have a bit length of l bits.
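
How the first instruction decoder might tell a referential instruction 12 apart from a regular instruction 11 by its op code header can be sketched at bit level in Python. Only k = 48 bits is taken from the preferred embodiment; the 6-bit op code header at the top, the 8-bit pointer P at the bottom, the packing of the memory addresses into the remaining 34 bits and the value of `REF_OPCODE` are hypothetical choices made for illustration.

```python
K_BITS = 48            # length k of regular and referential instructions
OPCODE_BITS = 6        # hypothetical width of the op code header
POINTER_BITS = 8       # hypothetical width of pointer P (256 CLIW entries)
ADDR_BITS = K_BITS - OPCODE_BITS - POINTER_BITS  # remaining field for memory addresses
REF_OPCODE = 0b111111  # hypothetical special op code marking a referential instruction


def encode_referential(addresses: int, pointer: int) -> int:
    """Pack a referential instruction as [op code | address field | pointer P]."""
    if not (0 <= addresses < 1 << ADDR_BITS and 0 <= pointer < 1 << POINTER_BITS):
        raise ValueError("field out of range")
    return (REF_OPCODE << (K_BITS - OPCODE_BITS)) | (addresses << POINTER_BITS) | pointer


def decode_first_type(word: int) -> dict:
    """Discriminate a k-bit word as regular or referential by its op code header."""
    if not 0 <= word < 1 << K_BITS:
        raise ValueError("not a k-bit instruction word")
    opcode = word >> (K_BITS - OPCODE_BITS)
    if opcode != REF_OPCODE:
        return {"kind": "regular", "opcode": opcode}
    return {
        "kind": "referential",
        "addresses": (word >> POINTER_BITS) & ((1 << ADDR_BITS) - 1),
        "pointer": word & ((1 << POINTER_BITS) - 1),  # selects CLIW instruction 13
    }


print(decode_first_type(encode_referential(0x123, 5)))
print(decode_first_type(1 << 42))  # op code 1 in the header: a regular instruction
```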

Whereas regular instructions 11 and referential instructions 12 are stored in the regular program memory 2, CLIW instructions are stored in the CLIW memory 10. Thus, the regular program memory 2 and the CLIW memory 10 are configured with the respective bit length of the instruction words stored therein. In the preferred embodiment, regular instructions 11 and referential instructions 12 have a bit length of 48 bits, whereas CLIW instructions 13 have a bit length of 96 bits. While the regular instruction decoder 1 is sequentially and continuously decoding instructions from said regular program memory 2, additional instruction information from said CLIW memory 10 is only supplied to said execution logic unit 7 when said regular instruction decoder 1 decodes a referential instruction. At this point, the decoded instruction from the CLIW instruction decoder 9 is fed to the receiving means 8 of the execution logic unit 7, replacing all information which would normally be supplied by said regular instruction decoder 1.

Figure 3a is a table showing the execution of normal VLIW instructions in a processor with a five-stage pipeline according to the prior art. The table of Figure 3a shows the steps of instruction fetch, instruction decode, operand read, execution and operand write.

Figure 3b is a table showing the pipelined execution of a program according to the invention. For regular program instructions, the sequence of operations is identical to that shown in the table of Figure 3a. When a referential instruction is encountered, however, two additional steps are carried out alongside the existing pipeline stages. At the time a regular instruction is decoded as being a referential instruction, the CLIW instruction referred to in the referential instruction is fetched. See, for example, the line headed "instruction decode and CLIW fetch" between machine cycle 2 and machine cycle 6. Also, at the time operands are read from memory, the CLIW instruction fetched in the previous machine cycle is decoded. This is possible because the referential instruction 12 contains all the address information needed to read the operands. A referential instruction 12 contains a pointer to a particular CLIW instruction which is to be fetched and decoded so as to be executed with the data to be read. Reference is made to the line of the table in Figure 3b headed "operand read and CLIW decode" between machine cycles 3 and 7. The sequence of operations carried out in the pipeline for a particular instruction follows a diagonal line in the table, as indicated by an arrow.
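
The diagonal pattern of Figure 3b can be reproduced with a short Python sketch, assuming one instruction enters the five-stage pipeline per machine cycle and no stalls occur; the `"R"`/`"X"` labels for regular and referential instructions are invented for the example. The sketch shows that the CLIW fetch and CLIW decode steps ride along with the decode and operand-read stages, so a referential instruction adds no machine cycles.

```python
STAGES = ["instruction fetch", "instruction decode", "operand read",
          "execution", "operand write"]
# Steps performed alongside a stage when the instruction is referential.
CLIW_STEPS = {1: "CLIW fetch", 2: "CLIW decode"}


def schedule(program):
    """Map machine cycle -> [(instruction index, step)]; 'R' regular, 'X' referential."""
    table = {}
    for i, kind in enumerate(program):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1                    # instruction i is fetched in cycle i + 1
            step = stage
            if kind == "X" and s in CLIW_STEPS:
                step = f"{stage} and {CLIW_STEPS[s]}"
            table.setdefault(cycle, []).append((i, step))
    return table


for cycle, entries in sorted(schedule(["R", "X", "R"]).items()):
    print(cycle, entries)
```

With three instructions the program finishes in cycle 7 whether or not the middle one is referential.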
A prior art processor which controls multiple execution units in parallel by one VLIW instruction usually requires a large program memory space for the optimum usage of parallel execution in the data processing apparatus. The invention restricts the usage of long instructions to very time-consuming parts of an algorithm, the so-called inner loops. Thus, frequently executed instructions are executed in a highly parallel fashion, while the memory space required for the program code of instructions which cannot be carried out in parallel is significantly decreased. The code of a VLIW instruction of the prior art determines for each execution step the operation codes, the operand assignments, the output assignments and the memory addresses. The great variety of such configurations results in a high bit width of each of the VLIW instructions. Although VLIW instructions offer full coding flexibility for each execution step and thus always support maximum parallelism, the program code consumes a large amount of program memory, particularly for those execution steps which do not allow fully parallel operation.

Typical programs for digital signal processors generally consist of inner loops in which a few instructions are repeated very often. The instructions in an inner loop should be supported by the maximum parallelism of the digital signal processor, because this can reduce the required run time to a large extent.

The invention solves this problem by using short instructions combined with configurable length instruction words (CLIW). Thus, the invention offers the advantage of maximizing the execution efficiency of inner loops while limiting the program space needed for program code outside these inner loops.

The regular instructions outside the inner loops are executed sequentially. A regular instruction covers only certain frequently used connections and operations of the execution units and the necessary operands. All regular instructions are fetched directly from the regular program memory 2. Additionally, CLIW instructions are stored in a dedicated CLIW memory 10. A special referential instruction is used for initiating the execution of CLIW instructions. The referential instruction loads a CLIW instruction from the CLIW memory 10. The address P of the CLIW instruction to be fetched is defined by the referential instruction.

A CLIW instruction 13 defines all possible types of operation, operand connections and output connections. The referential instruction includes all required memory addresses for the operations defined in the CLIW instruction associated therewith. Thus, the referential instruction together with its associated CLIW instruction contains all the information that a VLIW instruction according to the prior art requires.
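
The division of labour between the two instruction types can be illustrated with Python dataclasses: the CLIW instruction supplies the operation, operand-assignment and result-assignment segments of Figure 2a, and the referential instruction supplies the memory-address segment and the pointer P. The class and field names are hypothetical, and the sketch does not model bit widths.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Cliw:                                   # instruction 13: no memory addresses
    operations: tuple[str, ...]
    operand_assignment: tuple[tuple[int, ...], ...]
    result_assignment: tuple[int, ...]


@dataclass(frozen=True)
class Referential:                            # instruction 12: addresses and pointer P
    addresses: tuple[int, ...]
    pointer: int


@dataclass(frozen=True)
class VliwEquivalent:                         # the four segments of VLIW word 14
    operations: tuple[str, ...]
    operand_assignment: tuple[tuple[int, ...], ...]
    result_assignment: tuple[int, ...]
    addresses: tuple[int, ...]


def combine(ref: Referential, cliw_memory: list[Cliw]) -> VliwEquivalent:
    """Merge a referential instruction with the CLIW instruction it points to."""
    c = cliw_memory[ref.pointer]
    return VliwEquivalent(c.operations, c.operand_assignment,
                          c.result_assignment, ref.addresses)


mac = Cliw(("mul", "add"), ((0, 1), (2, 3)), (0, 1))
a = combine(Referential((0x100, 0x200, 0x300, 0x400), 0), [mac])
b = combine(Referential((0x110, 0x210, 0x310, 0x410), 0), [mac])
print(a.operations == b.operations, a.addresses != b.addresses)  # True True
```

Two referential instructions pointing at the same CLIW entry yield the same operations on different data.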

Since the bit width of regular program instructions (and thus also of referential instructions) is preferably configured to be significantly lower than the bit width of CLIW instructions, it is possible to write a much more compact program code than with VLIW instructions only.

Each further execution of the same CLIW instruction requires just another regular (short) referential instruction 12 in the program code. Since the type of parallel operations and connections typically does not change within a set of CLIW instructions (for example, for the execution of matrix operations), it is possible to save program space for CLIW instructions by simply changing the memory address in the referential instruction.
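
A back-of-the-envelope Python comparison illustrates the saving. It uses the 48-bit and 96-bit widths of the preferred embodiment and assumes, for illustration only, that an equivalent VLIW word would need 48 + 96 = 144 bits, since a referential instruction and its CLIW instruction together carry the information of one VLIW instruction; real VLIW encodings may differ. The step counts in the example are likewise invented.

```python
SHORT_BITS = 48                      # regular / referential instruction (preferred embodiment)
CLIW_BITS = 96                       # CLIW instruction (preferred embodiment)
VLIW_BITS = SHORT_BITS + CLIW_BITS   # assumption: same information as referential + CLIW


def cliw_program_bits(regular_steps: int, parallel_steps: int, distinct_cliws: int) -> int:
    """Every step costs one short word; each distinct CLIW is stored once in CLIW memory 10."""
    return (regular_steps + parallel_steps) * SHORT_BITS + distinct_cliws * CLIW_BITS


def vliw_program_bits(regular_steps: int, parallel_steps: int) -> int:
    """Assumed pure VLIW program: a full-width word for every step."""
    return (regular_steps + parallel_steps) * VLIW_BITS


# 100 sequential steps plus 20 parallel steps that reuse only 4 distinct CLIW instructions
print(cliw_program_bits(100, 20, 4))  # 6144
print(vliw_program_bits(100, 20))     # 17280
```

The saving grows with the number of steps that reuse an already stored CLIW instruction.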

It is thus preferred to specify the memory addresses of operands within a referential instruction 12 independently of the reference to a particular CLIW instruction 13. This not only allows different memory operands to be used with the same CLIW instruction but also speeds up the instruction flow within the processor when pipelined execution is used.

The number of required CLIW instructions in the inner loops depends on the actual program. The fixed number of CLIW instructions available in a CLIW memory can be extended: after initialization, the CLIW memory can be dynamically reconfigured by recalling referential instructions. Different packets of CLIW instructions can thus be used in different parts of an algorithm. This feature is enabled by reloading CLIW memory packets at run time.

The size of the CLIW memory is user-definable. Usually the
CLIW memory will be much smaller than the program memory.
Those parts of the CLIW memory which always contain a constant
set of CLIW instructions can be implemented as read-only memory
(ROM). CLIW instructions encoded in a ROM can still be called
together with data at different memory addresses, because the
address information is included in the referential
instruction.
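The ROM arrangement can be sketched by splitting the CLIW address space into a read-only lower region and a writable upper region. The split, the class name and the `issue` helper are hypothetical. The point illustrated is that a CLIW word held in ROM is still combined with whatever addresses the referential instruction supplies.

```python
from typing import List, Optional, Sequence, Tuple


class PartitionedCliwMemory:
    """CLIW memory whose low slots are ROM (fixed) and high slots are RAM (reloadable)."""

    def __init__(self, rom: Sequence[str], ram_size: int) -> None:
        self.rom: Tuple[str, ...] = tuple(rom)
        self.ram: List[Optional[str]] = [None] * ram_size

    def _check(self, index: int) -> None:
        if not 0 <= index < len(self.rom) + len(self.ram):
            raise IndexError(f"CLIW slot {index} does not exist")

    def write(self, index: int, word: str) -> None:
        """Store a CLIW word; only slots in the RAM region are writable."""
        self._check(index)
        if index < len(self.rom):
            raise PermissionError(f"CLIW slot {index} is in ROM")
        self.ram[index - len(self.rom)] = word

    def fetch(self, index: int) -> str:
        """Read a CLIW word from either region."""
        self._check(index)
        if index < len(self.rom):
            return self.rom[index]
        word = self.ram[index - len(self.rom)]
        if word is None:
            raise LookupError(f"CLIW slot {index} is empty")
        return word


def issue(mem: PartitionedCliwMemory, cliw_index: int,
          src_addr: int, dst_addr: int) -> Tuple[str, int, int]:
    """Pair the fetched CLIW word with the addresses carried by a referential instruction."""
    return mem.fetch(cliw_index), src_addr, dst_addr


mem = PartitionedCliwMemory(rom=["MAC4"], ram_size=2)
mem.write(1, "ADD2")
print(issue(mem, 0, 0, 100))   # ('MAC4', 0, 100)
print(issue(mem, 0, 16, 101))  # ('MAC4', 16, 101): same ROM word, new data
```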
What we claim is:

1. A data processing apparatus for executing instructions
of a program comprising a plurality of instructions, said
apparatus having:

- a first instruction decoder (1) for sequentially fetching
program instructions (11) of a first type from a first
program memory (2) and for decoding instructions of said
first type;

- an address decoder (4) for determining the address of
data to be loaded from or written to a data memory (3);

- a plurality of computational units (61, 62, 63, 64) for
executing operations upon data according to the
interpretation of said first instruction decoder (1) and for
providing the results of these operations;

- an execution logic unit (7) for providing said plurality 
of computational units (61, 62, 63, 64) with data and for 
controlling the operation of said plurality of computa- 
tional units (61, 62, 63, 64) according to an instruction 
(11) of said first type; 

characterized in that 

said first instruction decoder (1) discriminates whether 
said apparatus is to execute a referential instruction (12) 
which initiates execution of an instruction (13) of a second 
type. 

2. The apparatus according to claim 1,
characterized by

a second instruction decoder (9) for fetching an instruction
(13) of said second type and for decoding an instruction (13)
of said second type.

3. The apparatus according to claim 1 or 2,

characterized in that

said instruction (13) of said second type comprises a
plurality of operators including data assignment information
of operands and data assignment information of results.

4. The apparatus according to any of the previous claims,

characterized in that

said execution logic (7) comprises means (8) for interpreting
instructions (13) of said second type.

5. The apparatus according to any of the previous claims,

characterized in that

said referential instruction (12) includes address
information of data upon which said instruction (13) of said
second type is to be executed.

6. The apparatus according to any of the previous claims,
characterized in that

said apparatus is configured to allow for a pipe-lined
execution of instructions (11, 12; 13) of either the first or
the second type.

WO 99/42922 PCT/EP99/00849

7. The apparatus of any of claims 2 to 6,

characterized in that

instructions (13) of said second type are stored in a second
program memory (10).

8. A method of executing instructions for a data processing
apparatus comprising a plurality of computational units (61,
62, 63, 6n) which can be operated in parallel, and data
registers (5), the method comprising the steps of:

- fetching (IF1, IF2, ..., IF5) an instruction (11) of a
first type from a first program memory (2);

- decoding (ID1, ID2, ..., ID5) said instruction (11) of
said first type for determining the operation to be
executed;

- reading (OR1, OR2, ..., OR5) operands from a data memory
(3) or from said data registers (5);

- executing (E1, E2, ..., E5) an operation upon said
operands; and

- writing (OW1, OW2, ..., OW5) the results of said
operation into said data memory (3) or into said data
registers (5);

characterized in that 

upon decoding of a referential instruction (12), which
includes predetermined information so as to be decoded as such,
the following steps are performed:

- fetching (CF1, CF2, ..., CF5) an instruction (13) of a
second type according to information included in said
referential instruction (12); and

- decoding said instruction (13) of said second type for
determining the operations to be executed in parallel.

9. The method according to claim 8,

characterized in that

said referential instruction (12) includes address
information, including operand addresses and result addresses,
which is decoded substantially at the time of decoding said
referential instruction (12).

10. The method according to claim 8 or 9,

characterized in that

said step of decoding a referential instruction (12) and
said step of fetching an instruction (13) of said second
type, which is associated with a particular referential
instruction (12), are executed substantially simultaneously.

11. The method according to any of claims 8 to 10,
characterized in that

said step of reading operands from a data memory (3) and said
step of decoding an instruction (13) of said second type,
which is associated with said operands, are executed
substantially simultaneously.

12. The method according to any of claims 8 to 11,

characterized in that

said method is carried out by a data processing apparatus in
a pipe-lined manner.